Market Research Analysis Using ChatGPT
Learn how to use ChatGPT to perform analysis on market research results effectively.
In this lesson, we'll focus on transforming information into actionable insights. For this purpose, we created a survey questionnaire using ChatGPT regarding shopping habits and preferences, as shown below:
Shopping Habits and Preferences
1. What is your age range?
a) Under 18
b) 18–24
c) 25–34
d) 35–44
e) 45–54
f) 55–64
g) 65 or over
2. What is your gender?
a) Male
b) Female
c) Non-binary
d) Prefer not to say
3. How often do you shop for groceries in a week?
a) Less than once a week
b) 1–2 times a week
c) 3–4 times a week
d) 5 or more times a week
4. Where do you prefer to shop for groceries?
a) Supermarkets
b) Specialty stores (e.g. organic food stores)
c) Online grocery delivery
d) Farmers’ markets
5. Which of the following factors influences your decision to make a purchase while shopping?
a) Price
b) Quality
c) Brand reputation
d) Convenience
6. What do you like and dislike about shopping in-store versus online?
7. To what extent do you agree with the statement: “I enjoy browsing and discovering new products while shopping”?
a) Strongly disagree
b) Disagree
c) Neutral
d) Agree
e) Strongly agree
8. How likely are you to recommend your favorite store/brand for non-grocery items to a friend or family member?
a) Very unlikely
b) Unlikely
c) Neutral
d) Likely
e) Very likely
9. Do you tend to stick with familiar brands when purchasing non-grocery items, or are you willing to try new brands?
a) I usually stick with familiar brands
b) I sometimes try new brands
c) I often try new brands
d) I almost always try new brands
We'll analyze the responses collected with this survey. The following table shows a sample of five responses, one respondent per column.
Sample Responses of the Survey
Variable | Respondent 1 | Respondent 2 | Respondent 3 | Respondent 4 | Respondent 5 |
Age Range | 18-24 | 25-34 | 35-44 | 25-34 | 35-44 |
Gender | Female | Female | Male | Female | Male |
Grocery Shopping Frequency | 3-4 times a week | 1-2 times a week | Less than once a week | 1-2 times a week | Less than once a week |
Preferred Shopping Location | Supermarkets | Specialty stores (e.g. organic food stores) | Online grocery delivery | Supermarkets | Supermarkets |
Factors Influencing Purchase Decisions | Price | Quality | Brand reputation | Price | Convenience |
Likes and Dislikes | In-store is faster for me to shop and decide about a product. | In stores we can check the quality in-person | – | Shopping from store is convenient | – |
Enjoy Browsing | Agree | Strongly Agree | Agree | Strongly Agree | Agree |
Recommend to Others | Unlikely | Very Likely | Likely | Neutral | Likely |
Familiar Brands Preference | I sometimes try new brands | I often try new brands | I sometimes try new brands | I sometimes try new brands | I often try new brands |
We are now ready to analyze our data. Before doing so, let’s review what statistical analysis involves.
Statistical analysis
Statistical analysis involves collecting, organizing, interpreting, and drawing conclusions from data. There are two broad approaches to analyzing data, descriptive statistics and inferential statistics, described below:
Descriptive statistics
Descriptive statistics provide a concise summary of the survey responses. This includes techniques like measures of central tendency, measures of dispersion, measures of distribution, percentages, and so on.
Measures of central tendency: These include the dataset’s mean, median, and mode. The mean is the arithmetic average of the responses. The median is the middle value once all responses are sorted in ascending or descending order. The mode is the most frequent value in the responses. Note that the mean requires quantitative data and the median requires data that can at least be ordered, whereas the mode can be found for any type of data, including categorical responses.
Measures of dispersion: These tell us how much the responses deviate from the central tendency. Typical measures are the range and the standard deviation of the dataset. The range is the difference between the maximum and minimum responses and measures the total spread. The standard deviation indicates how much the collected responses vary around the mean: a higher value suggests greater dispersion, and a lower value indicates less variability.
Measures of distribution: These describe the symmetry and shape of the distribution of data points in a dataset. Percentiles are a common measure of distribution. They allow us to position a response relative to all collected responses. For example, the 75th percentile denotes the value below which 75% of our responses fall.
Note: Descriptive statistics are numbers that summarize the data.
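The measures described above can be sketched with Python's standard `statistics` module. The `weekly_trips` list is a hypothetical set of numeric answers invented for illustration; the survey in this lesson collects categories, so these numbers are not part of its data.

```python
import statistics

# Hypothetical numeric responses: weekly grocery trips reported by seven people.
# These values are illustrative only and do not come from the survey.
weekly_trips = [1, 2, 2, 3, 4, 2, 7]

# Measures of central tendency
mean_trips = statistics.mean(weekly_trips)      # arithmetic average
median_trips = statistics.median(weekly_trips)  # middle value after sorting
mode_trips = statistics.mode(weekly_trips)      # most frequent value

# Measures of dispersion
range_trips = max(weekly_trips) - min(weekly_trips)  # total spread
stdev_trips = statistics.stdev(weekly_trips)         # sample standard deviation

# Measure of distribution: the three quartile cut points.
# The last one (q3) is the 75th percentile.
q1, q2, q3 = statistics.quantiles(weekly_trips, n=4)

print(f"mean={mean_trips}, median={median_trips}, mode={mode_trips}")
print(f"range={range_trips}, stdev={stdev_trips:.2f}, 75th percentile={q3}")
```

Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` would be the choice if the data covered the whole population.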
Let’s use ChatGPT to examine the responses to our survey in light of descriptive statistics:
Prompt: Given the survey questions and results, generate the measures of central tendency and measures of dispersion of each variable provided in the data.
Let’s calculate the measures of central tendency and dispersion for the variables based on the provided data:
1. Age Range:
- Measures of Central Tendency: Since age range is categorical, we can calculate the mode, which represents the most common age range among the respondents.
- Mode: 35-44
2. Gender:
- Measures of Central Tendency: Since gender is categorical, we can calculate the mode, which represents the most common gender among the respondents.
- Mode: Male
3. Grocery Shopping Frequency:
- Measures of Central Tendency: Since grocery shopping frequency is categorical, we can calculate the mode, which represents the most common shopping frequency among the respondents.
- Mode: 1-2 times a week
4. Preferred Grocery Store:
- Measures of Central Tendency: Since the preferred grocery store is categorical, we can calculate the mode, which represents the most commonly preferred store among the respondents.
- Mode: Supermarkets
5. Purchase Influencing Factors:
- Measures of Central Tendency: Since purchase influencing factors are categorical, we can calculate the mode, which represents the most common influencing factor among the respondents.
- Mode: Quality
6. Shopping Preference:
- Measures of Central Tendency: Since shopping preference is categorical, we can calculate the mode, which represents the most common shopping preference among the respondents.
- Mode: Agree
7. Enjoy Browsing:
- Measures of Central Tendency: Since “Enjoy Browsing” is categorical, we can calculate the mode, which represents the most common response among the respondents.
- Mode: Agree
8. Recommend Store:
- Measures of Central Tendency: Since “Recommend Store” is categorical, we can calculate the mode, which represents the most common response among the respondents.
- Mode: Unlikely
9. Brand Preference:
- Measures of Central Tendency: Since brand preference is categorical, we can calculate the mode, which represents the most common brand preference among the respondents.
- Mode: Sometimes try
For the variables mentioned above, we cannot calculate measures of dispersion such as variance, standard deviation, and range since they are applicable to numerical data.
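The mode of a categorical variable can also be checked directly. The sketch below is a minimal example that uses only the five sample responses shown in the table earlier, not the full dataset given to ChatGPT, so its modes need not match the ones reported above. The `sample` dictionary and its layout are assumptions made for illustration.

```python
from statistics import multimode

# The five sample responses from the table above (not the full dataset).
sample = {
    "Age Range": ["18-24", "25-34", "35-44", "25-34", "35-44"],
    "Gender": ["Female", "Female", "Male", "Female", "Male"],
    "Grocery Shopping Frequency": [
        "3-4 times a week",
        "1-2 times a week",
        "Less than once a week",
        "1-2 times a week",
        "Less than once a week",
    ],
    "Preferred Shopping Location": [
        "Supermarkets",
        "Specialty stores",
        "Online grocery delivery",
        "Supermarkets",
        "Supermarkets",
    ],
    "Factors Influencing Purchase Decisions": [
        "Price", "Quality", "Brand reputation", "Price", "Convenience",
    ],
    "Enjoy Browsing": ["Agree", "Strongly Agree", "Agree", "Strongly Agree", "Agree"],
}

# multimode returns every value tied for most frequent, in first-seen order,
# so ties are reported instead of being silently broken.
modes = {variable: multimode(answers) for variable, answers in sample.items()}

for variable, values in modes.items():
    print(f"{variable}: {', '.join(values)}")
```

With so few rows, ties are common (two age ranges share the top spot here), which is why `multimode` is used rather than `mode`.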
Similarly, we can analyze which factors respondents consider when making a purchase. Our survey lists four factors: price, quality, brand reputation, and convenience. Let’s use ChatGPT to calculate the percentage of respondents who chose each factor.
Prompt: Show me the percentage of each response for the factors involved in the question: “Which of the following factors influences your decision to make a purchase while shopping”
Here are the percentages for each response of the question “Which of the following factors influences your decision to make a purchase while shopping?” based on the provided survey data:
- Price: 32.14%
- Quality: 28.57%
- Brand reputation: 14.28%
- Convenience: 25%
Please note that the percentages are rounded to two decimal places and may not add up to exactly 100% due to rounding.
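A percentage breakdown like this one is a frequency count divided by the number of answers. The following minimal sketch shows the arithmetic on the five sample responses from the table earlier, so its output differs from the full-dataset percentages above. The function name `response_percentages` is hypothetical.

```python
from collections import Counter


def response_percentages(answers):
    """Return {response: percentage of all answers}, rounded to two decimals."""
    if not answers:
        return {}
    counts = Counter(answers)
    total = len(answers)
    return {
        response: round(100 * count / total, 2)
        for response, count in counts.items()
    }


# Answers of the five sample respondents to the purchase-factor question.
factor_answers = ["Price", "Quality", "Brand reputation", "Price", "Convenience"]

percentages = response_percentages(factor_answers)
for factor, percentage in percentages.items():
    print(f"{factor}: {percentage}%")
```

Running the same function on the complete list of answers is a quick way to cross-check the percentages ChatGPT reports.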
We can also use descriptive statistics to compare multiple variables. For example, let’s see what percentage of males and females shop 1–2 times a week. This prompt involves two variables: gender and frequency of shopping.
Prompt: Deduce from the survey what percentage of males and females shop 1–2 times a week.
To calculate the percentage of males and females who shop 1–2 times a week based on the given survey data, we need to consider the total number of males and females in the dataset. From the given data, we have:
- Total number of males: 17
- Total number of females: 7
Now, let’s calculate the percentage:
Percentage of males who shop 1–2 times a week: (17 / 24) * 100 = 70.8%
Percentage of females who shop 1–2 times a week: (7 / 24) * 100 = 29.2%
Therefore, the percentage of males who shop 1–2 times a week is approximately 70.8%, and the percentage of females who shop 1–2 times a week is 29.2%.
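The calculation above divides each gender's count by the combined count of 24, which gives the gender split within that group. A different reading of the prompt is the share of each gender's own respondents who shop 1–2 times a week. The sketch below computes both on the five sample responses only, so the numbers are illustrative; the function names are hypothetical.

```python
from collections import Counter

# (gender, grocery shopping frequency) for the five sample respondents.
responses = [
    ("Female", "3-4 times a week"),
    ("Female", "1-2 times a week"),
    ("Male", "Less than once a week"),
    ("Female", "1-2 times a week"),
    ("Male", "Less than once a week"),
]
TARGET = "1-2 times a week"


def share_within_gender(rows, target):
    """Percentage of each gender's respondents who gave the target answer."""
    totals, hits = Counter(), Counter()
    for gender, frequency in rows:
        totals[gender] += 1
        if frequency == target:
            hits[gender] += 1
    return {g: round(100 * hits[g] / totals[g], 1) for g in totals}


def gender_split_of_target(rows, target):
    """Percentage of the target-answer respondents that belongs to each gender."""
    genders = list(dict.fromkeys(g for g, _ in rows))  # unique, first-seen order
    hits = Counter(g for g, frequency in rows if frequency == target)
    total = sum(hits.values())
    if total == 0:
        return {g: 0.0 for g in genders}
    return {g: round(100 * hits[g] / total, 1) for g in genders}


within = share_within_gender(responses, TARGET)
split = gender_split_of_target(responses, TARGET)
print("Share of each gender shopping 1-2 times a week:", within)
print("Gender split among 1-2 times a week shoppers:", split)
```

The two results answer different questions, so it helps to state in the prompt which denominator is intended.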
Inferential statistics
Inferential statistics involves drawing conclusions about a population based on sample data. It includes techniques like hypothesis testing, confidence intervals, regression analysis, and analysis of variance (ANOVA) to make inferences and determine the significance of relationships or differences in the data.
Hypothesis testing: We state a hypothesis about the population and use the sample data to gather evidence for or against it. For example, if our survey aims to test the hypothesis that most customers prefer shopping online, the responses provide evidence that either supports or refutes this hypothesis.
Confidence intervals: A confidence interval is a range of values, paired with a confidence level, within which a population parameter is likely to fall. For example, suppose the 95% confidence interval for the mean customer satisfaction score is (4.2, 4.7). Based on the sample data, we can be 95% confident that the mean satisfaction score of the whole population lies between 4.2 and 4.7.
Regression analysis: This examines the relationship between variables and uses that relationship to make predictions.
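To make the first two techniques concrete, here is a minimal sketch of a one-sided test and a 95% confidence interval for a proportion, using the normal approximation from Python's standard library. The counts (120 of 200 respondents preferring online shopping) are hypothetical and do not come from the survey.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: 120 of 200 respondents prefer shopping online.
n = 200
prefer_online = 120
p_hat = prefer_online / n  # sample proportion

# Hypothesis test.
# H0: the population proportion is 0.5; H1: it is greater than 0.5.
p0 = 0.5
z_score = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z_score)  # one-sided p-value

# 95% confidence interval for the proportion (normal approximation).
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - margin, p_hat + margin

print(f"z = {z_score:.2f}, one-sided p-value = {p_value:.4f}")
print(f"95% CI for the proportion: ({ci_low:.3f}, {ci_high:.3f})")
```

The normal approximation is reasonable for samples of this size; for very small samples, such as the five rows shown earlier, an exact method would be more appropriate.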
Let’s perform inferential statistics on the survey data using ChatGPT. The prompt is given below:
Prompt: Given the survey questions and survey results, generate a report by providing inferential statistics on the results.
To generate inferential statistics based on the provided survey data, we can perform analysis and draw conclusions about the larger population from which the sample was collected. However, please note that the sample size is relatively small, and the data provided is limited. Nonetheless, we can still provide some insights and observations:
- Preferred Shopping Location:
- Supermarkets are the most preferred shopping location, chosen by 64.71% of respondents. This suggests that supermarkets are a popular choice among the surveyed population.
- Specialty stores, online grocery delivery, and farmers’ markets are less frequently preferred options.
- Factors Influencing Purchase Decisions:
- Quality, brand reputation, and price are identified as significant factors influencing purchase decisions.
- Convenience is also considered an important factor, but to a slightly lesser extent.
- Enjoyment of Browsing:
- A majority of respondents (more than 70%) agree or strongly agree that they enjoy browsing and discovering new products while shopping. This suggests that browsing is an engaging and enjoyable experience for most individuals.
- Recommendation to Others:
- Approximately 57% of respondents are likely or very likely to recommend their favorite store/brand for non-grocery items to a friend or family member. This indicates a positive perception and satisfaction with their preferred store/brand.
- Familiar Brands Preference:
- About 57% of respondents usually stick with familiar brands when purchasing non-grocery items, while around 47% of respondents are willing to try new brands. This suggests a relatively balanced preference between familiarity and willingness to explore new brands.
These observations provide some insights into the preferences and behaviors of the surveyed population. However, it’s important to note that these conclusions are specific to the given sample and may not be representative of the entire population. To draw more accurate and reliable inferences, a larger and more diverse sample would be required.
Visualizing data using graphs
So far, we have focused on generating reports and summarizing responses. However, an essential component of a market research report involves visualizing the results through charts and graphs.
Let’s use some programming tools to visualize the descriptive statistics that ChatGPT produced for our collected data. We will focus on the earlier analysis of the relative importance of the factors behind shopping decisions.
Bar chart
The code snippet below demonstrates how we can visualize the percentage distribution of each shopping factor—price, quality, convenience, brand reputation—as provided by ChatGPT in response to the given prompt:
Prompt: Show me the percentage of each response for the factors involved in the question: “Which of the following factors influences your decision to make a purchase while shopping.”
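A minimal sketch of such a bar chart, assuming matplotlib is installed. The function names, colors, and labels are illustrative choices rather than the lesson's original code; the percentages are the ones ChatGPT reported for this prompt.

```python
# Percentages reported by ChatGPT for the purchase-factor question.
FACTOR_PERCENTAGES = {
    "Price": 32.14,
    "Quality": 28.57,
    "Brand reputation": 14.28,
    "Convenience": 25.0,
}


def sort_descending(percentages):
    """Return (label, value) pairs ordered from largest to smallest value."""
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)


def plot_factor_bar_chart(percentages):
    """Draw a bar chart with one bar per factor (requires matplotlib)."""
    # Imported here so the data helpers above work without matplotlib.
    import matplotlib.pyplot as plt

    labels, values = zip(*sort_descending(percentages))
    fig, ax = plt.subplots()
    ax.bar(labels, values, color="steelblue")
    ax.set_xlabel("Factor")
    ax.set_ylabel("Percentage of responses (%)")
    ax.set_title("Factors Percentage During Shopping")
    fig.tight_layout()
    plt.show()
```

Calling `plot_factor_bar_chart(FACTOR_PERCENTAGES)` opens the chart, with bars ordered from the most to the least frequently chosen factor.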
In the code above, we visualize the percentage of each factor involved in purchasing or shopping, as yielded by ChatGPT. The percentage of each response is shown in the following table:
Factors Percentage During Shopping
Factor | Percentage |
Price | 32.14% |
Quality | 28.57% |
Convenience | 25% |
Brand reputation | 14.28% |
Pie chart
Using a pie chart, let’s visualize the percentage of males and females who shop 1–2 times a week. As before, we obtain the percentages from ChatGPT using the following prompt.
Prompt: Deduce from the survey what percentage of males and females shop 1–2 times a week.
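A minimal sketch of such a pie chart, assuming matplotlib is installed. The function name and styling are illustrative choices rather than the lesson's original code; the percentages are the ones ChatGPT reported for this prompt.

```python
# Percentages reported by ChatGPT for shopping 1-2 times a week, by gender.
GENDER_PERCENTAGES = {"Male": 70.8, "Female": 29.2}


def plot_gender_pie_chart(percentages):
    """Draw a pie chart with one slice per gender (requires matplotlib)."""
    # Imported here so the data above can be inspected without matplotlib.
    import matplotlib.pyplot as plt

    labels = list(percentages.keys())
    values = list(percentages.values())
    fig, ax = plt.subplots()
    # autopct prints each slice's share with one decimal place.
    ax.pie(values, labels=labels, autopct="%.1f%%", startangle=90)
    ax.set_title("Frequency of Shopping 1-2 Times a Week")
    ax.axis("equal")  # keep the pie circular
    plt.show()
```

Calling `plot_gender_pie_chart(GENDER_PERCENTAGES)` opens the chart. A pie chart suits this data because the two percentages are parts of a single whole.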
In the code above, we visualize the percentage of males and females who shop 1–2 times a week. The percentages yielded by ChatGPT are given below:
Frequency of Shopping 1-2 Times a Week
Gender | Percentage |
Male | 70.8% |
Female | 29.2% |
In this lesson, we transformed survey information into actionable insights. We created a survey questionnaire with ChatGPT to gather data on shopping habits and preferences, summarized the results with descriptive statistics, drew conclusions about the population with inferential statistics, and presented the findings with bar and pie charts.